Moments of a distribution

Moments are quantitative measures of the shape of a distribution. The n-th moment of the probability density function f(x) about the value c is:

$$\mu_n=\int_{-\infty}^\infty (x - c)^n\,f(x)\,dx.$$
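As a rough illustration of the definition, the integral can be evaluated numerically for a standard normal density. This is only a sketch: the helper name `moment` is hypothetical (not part of SciPy), and it assumes `scipy.integrate.quad` converges well for the chosen density.

```python
import numpy as np
from scipy import integrate, stats


def moment(pdf, n, c=0.0):
    """n-th moment of the density `pdf` about `c` (hypothetical helper)."""
    # Integrate (x - c)^n * f(x) over the whole real line.
    value, _abs_err = integrate.quad(lambda x: (x - c) ** n * pdf(x),
                                     -np.inf, np.inf)
    return value


pdf = stats.norm(0.0, 1.0).pdf       # standard normal density

mean = moment(pdf, 1)                # first moment about zero
variance = moment(pdf, 2, c=mean)    # second moment about the mean
fourth = moment(pdf, 4, c=mean)      # fourth moment about the mean
print(mean, variance, fourth)
```

For the standard normal the three values should come out close to 0, 1 and 3 respectively, up to integration error.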

In [1]:
%pylab inline
import scipy.stats as stats


Populating the interactive namespace from numpy and matplotlib

In [2]:
mu, sigma = 0.0, 1.0
data = np.random.normal(mu, sigma, 10000)

In [3]:
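# density=True normalizes the histogram to unit area (replaces normed=1, removed in Matplotlib 3.1)
plt.hist(data, density=True, bins=20, facecolor='yellow', alpha=.5);

# overlay the theoretical normal density
xval = np.linspace(-4, 4, 100)
rv = stats.norm(mu, sigma)
plt.plot(xval, rv.pdf(xval), color='#AA0000')
plt.ylabel('Probability density'); plt.xlabel('Value');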


1.- Mean

Where the distribution is centered. It is the first moment about zero (c = 0).


In [4]:
np.mean(data)


Out[4]:
0.0096754113676674504

2.- Variance

How wide the distribution is, that is, how far the values spread around the mean. It is the second moment about the mean (c = mean).


In [5]:
np.var(data)


Out[5]:
1.0032090293677065
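To connect the variance back to the moment definition, the sketch below computes the second central moment by hand and compares it with `np.var`. The sample, the seed and the variable names are assumptions made for illustration; they are not the notebook's `data`.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, chosen arbitrarily
sample = rng.normal(0.0, 1.0, 10000)

# Second central moment: mean squared deviation from the sample mean.
second_central = np.mean((sample - sample.mean()) ** 2)

population_var = np.var(sample)            # default ddof=0, divides by N
sample_var = np.var(sample, ddof=1)        # divides by N - 1

print(second_central, population_var, sample_var)
```

With the default `ddof=0`, `np.var` matches the second central moment; `ddof=1` gives the slightly larger estimate that divides by N - 1, and the difference is negligible for 10000 points.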

3.- Skew

How lopsided the distribution is. A distribution with a long tail to the left or to the right is skewed; a symmetric distribution, such as the normal, has zero skew.


In [6]:
stats.skew(data)


Out[6]:
0.01667972299948282
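A skew near zero is easier to interpret next to a distribution that is clearly lopsided. The sketch below compares a normal sample with an exponential one (long right tail) and also computes the skew from the third and second central moments. The helper `sample_skew`, the seed and the sample names are hypothetical, introduced only for this example.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)                 # fixed seed, chosen arbitrarily
symmetric = rng.normal(0.0, 1.0, 10000)        # no preferred tail
right_tailed = rng.exponential(1.0, 10000)     # long tail to the right


def sample_skew(x):
    """Third central moment divided by the second central moment ** 1.5."""
    d = x - x.mean()
    return np.mean(d ** 3) / np.mean(d ** 2) ** 1.5


print(sample_skew(symmetric), stats.skew(symmetric))
print(sample_skew(right_tailed), stats.skew(right_tailed))
```

The hand-computed value should agree with `stats.skew` (which uses the biased estimator by default), and the exponential sample should show a clearly positive skew.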

4.- Kurtosis

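How thick the tails are and how sharp the peak is. Note that stats.kurtosis returns the excess kurtosis by default (Fisher's definition), which is zero for a normal distribution.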


In [7]:
stats.kurtosis(data)


Out[7]:
-0.0773123154951545
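The value near zero reflects the default Fisher definition used by `stats.kurtosis`. The sketch below computes the fourth standardized moment by hand and shows how it relates to both definitions. The helper `sample_kurtosis`, the seed and the sample are assumptions for illustration only.

```python
import numpy as np
import scipy.stats as stats

rng = np.random.default_rng(0)             # fixed seed, chosen arbitrarily
sample = rng.normal(0.0, 1.0, 10000)


def sample_kurtosis(x):
    """Fourth central moment divided by the squared second central moment."""
    d = x - x.mean()
    return np.mean(d ** 4) / np.mean(d ** 2) ** 2


pearson = sample_kurtosis(sample)          # about 3 for a normal sample
excess = pearson - 3.0                     # about 0 for a normal sample

print(pearson, stats.kurtosis(sample, fisher=False))
print(excess, stats.kurtosis(sample))      # fisher=True is the default
```

Subtracting 3 makes the normal distribution the reference point: positive excess kurtosis indicates heavier tails than the normal, negative indicates lighter tails.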

Percentile

A percentile is the value at or below which a given percentage of observations fall. The 25th percentile is also called Q1: 25% of observations lie at or below it. Q3 is the 75th percentile, and the median (or Q2) is the 50th percentile. The interquartile range (IQR) is the difference between Q3 and Q1, and the interval from Q1 to Q3 contains the middle 50% of all observations. In box plots, each whisker can be drawn out to the most distant data point that lies within 1.5 times the IQR of the box edge (Tukey style).

  • Q1 = 1st quartile = 25th percentile
  • Q2 = 2nd quartile = 50th percentile
  • Q3 = 3rd quartile = 75th percentile
  • Q4 = 4th quartile = 100th percentile

In [8]:
Q2 = np.percentile(data, 50) # median
print(Q2, np.median(data))


(0.0045219730369875488, 0.0045219730369875488)

In [9]:
Q1, Q3 = np.percentile(data, [25, 75]) # first and third quartiles
IQR = Q3-Q1
print(IQR)


1.35644924056
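The Tukey-style whiskers mentioned in the percentile description follow directly from the IQR. The sketch below is one way to compute them, assuming the common convention of fences at 1.5 times the IQR beyond Q1 and Q3; the variable names, the seed and the sample are hypothetical and stand in for the notebook's `data`.

```python
import numpy as np

rng = np.random.default_rng(0)             # fixed seed, chosen arbitrarily
sample = rng.normal(0.0, 1.0, 10000)

q1, q3 = np.percentile(sample, [25, 75])
iqr = q3 - q1

# Tukey fences: 1.5 * IQR beyond each edge of the box.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences.
inside = sample[(sample >= lower_fence) & (sample <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Points beyond the fences are the ones a box plot draws individually.
outliers = sample[(sample < lower_fence) | (sample > upper_fence)]

print(whisker_low, whisker_high, outliers.size)
```

For normally distributed data only a small fraction of points is expected to fall outside the fences, so the outlier count should be small relative to the sample size.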